# Description of the Data
## listings
name, host, host name, neighborhood, neighborhood group, latitude, longitude, room type, price, minimum number of nights, number of reviews, last review, reviews per month, host listing number, availability, number of reviews, license
## reviews
listing id, id, date, reviewer id, reviewer name, comments
## calendar
booking details for the next year for each listing: listing, date, available, price, minimum nights
1,836 unique listings are provided for the Washington, DC area.
Over 30,000 reviews were left between November 2010 and December 2021.
The main topics that I will be analyzing are:
Demand and Price Analysis
User Reviews
Other interesting things
possible data analysis
Spatial Analysis
This section looks at the locations of Airbnb listings in DC.
NOT WORKING
Demand and Price Analysis
Look at the demand over the years since Airbnb first appeared in the DC area.
Look at the relationship between price and demand: do listing prices fluctuate with demand, and how do prices vary by day of the week?
To estimate demand, the number of reviews will be used as the indicator.
Need to change the dates in reviewsNUM to just the year, so that the crowded dates on the bottom axis ticks can be shown as years only.
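The yearly grouping described above could be sketched as follows. This is a minimal Python illustration, not the code used for the analysis; it assumes the review dates are ISO-formatted strings (YYYY-MM-DD), and `reviews_per_year` and `sample_dates` are hypothetical names with made-up rows.

```python
from collections import Counter
from datetime import date

def reviews_per_year(review_dates):
    """Count reviews per calendar year from ISO date strings (YYYY-MM-DD)."""
    years = (date.fromisoformat(d).year for d in review_dates)
    # Sort by year so the result can be plotted directly with one tick per year.
    return dict(sorted(Counter(years).items()))

# Hypothetical rows standing in for the 'date' column of the reviews table.
sample_dates = ["2010-11-02", "2015-06-14", "2015-09-30", "2021-12-01"]
yearly_counts = reviews_per_year(sample_dates)
print(yearly_counts)
```

Reducing each date to its year before plotting leaves one label per year on the horizontal axis, which avoids the overlapping date ticks.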
How popular has Airbnb become in Washington DC?
NEED TO FIX GRAPH WITH YEARS AND ANGLES

How is Airbnb priced across the year?
After seeing the pattern in demand, we wanted to see whether the pricing of the listings followed a similar trend.
To answer this question, we used the data from the ‘calendar’ table to look at the daily
average prices of the listings over time.
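A rough sketch of the daily averaging step is shown here in Python. It assumes each calendar row provides a date and a price stored as text such as "$1,250.00"; that format, the helper names `parse_price` and `daily_average_price`, and the sample rows are all assumptions made for illustration.

```python
from collections import defaultdict

def parse_price(text):
    """Convert a price string such as '$1,250.00' to a float."""
    return float(text.replace("$", "").replace(",", ""))

def daily_average_price(rows):
    """rows: iterable of (date_string, price_string). Returns {date: mean price}."""
    totals = defaultdict(lambda: [0.0, 0])  # date -> [sum of prices, row count]
    for day, price in rows:
        totals[day][0] += parse_price(price)
        totals[day][1] += 1
    return {day: total / count for day, (total, count) in sorted(totals.items())}

# Hypothetical calendar rows: (date, price).
sample_rows = [
    ("2022-01-01", "$100.00"),
    ("2022-01-01", "$200.00"),
    ("2022-01-02", "$1,250.00"),
]
daily_prices = daily_average_price(sample_rows)
print(daily_prices)
```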

As the year advances, the average price across all listings tends to rise, peaking in December. The pattern closely follows that of the number of reviews (our indicator of demand), except in November and December, when the number of reviews begins to fall. This appears counter-intuitive, as one would expect prices to fall as demand falls. It could be a consequence of our assumption that the number of reviews reflects demand, which is not always the case.
On the graphs above, we can also notice two sets of points, indicating that average prices on certain days were higher than on other days. To better understand this phenomenon, we’ll create a box plot showing average prices by weekday.

We can see that prices are concentrated at a higher level on Fridays and Saturdays, i.e., listings cost more to rent on weekends.
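The weekday comparison can be approximated numerically by grouping daily average prices by day of the week. The Python sketch below assumes a mapping from ISO dates to average prices; `average_price_by_weekday` and the sample values are hypothetical and are not taken from the dataset.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

def average_price_by_weekday(daily_prices):
    """daily_prices: {ISO date: average price}. Returns {weekday name: mean price}."""
    by_day = defaultdict(list)
    for day, price in daily_prices.items():
        # date.weekday() returns 0 for Monday through 6 for Sunday.
        by_day[WEEKDAYS[date.fromisoformat(day).weekday()]].append(price)
    return {name: mean(by_day[name]) for name in WEEKDAYS if name in by_day}

# Hypothetical daily averages (made-up values).
sample_prices = {
    "2022-01-07": 180.0,  # Friday
    "2022-01-08": 190.0,  # Saturday
    "2022-01-10": 150.0,  # Monday
    "2022-01-14": 200.0,  # Friday
}
weekday_prices = average_price_by_weekday(sample_prices)
print(weekday_prices)
```

A box plot would show the full spread for each weekday; the mean used here is only a compact summary of the same grouping.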
Occupancy Rate by Month
I’ll end this section’s examination by looking at the occupancy forecast for the coming year.
We will use the ‘calendar’ table to determine the percentage occupancy for the next year, i.e., what
proportion of apartments had already been booked as of November 3, 2018 (the day the data
was obtained). We were unable to get historical occupancy data and, as a result, could not
investigate what the real occupancy rates are.
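As a hedged illustration of the occupancy calculation, the Python sketch below computes the share of unavailable listing-nights per month. It assumes the `available` column holds "t" or "f" and that every unavailable night counts as booked; both are assumptions, and `occupancy_by_month` and the sample rows are hypothetical.

```python
from collections import defaultdict

def occupancy_by_month(rows):
    """rows: iterable of (ISO date, available flag). Returns {YYYY-MM: % unavailable}."""
    booked = defaultdict(int)
    total = defaultdict(int)
    for day, available in rows:
        month = day[:7]  # 'YYYY-MM' prefix of the ISO date
        total[month] += 1
        if available == "f":  # assumption: 'f' means the night is already booked
            booked[month] += 1
    return {m: 100.0 * booked[m] / total[m] for m in sorted(total)}

# Hypothetical calendar rows: (date, available).
sample_calendar = [
    ("2019-01-01", "f"), ("2019-01-02", "t"),
    ("2019-01-03", "f"), ("2019-01-04", "f"),
    ("2019-02-01", "t"), ("2019-02-02", "f"),
]
occupancy = occupancy_by_month(sample_calendar)
print(occupancy)
```

Because a host can also block nights without a booking, this figure is better read as an upper bound on forward occupancy than as a measured rate.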
THIS SECTION NOT WORKING STILL NEED TO DOWNLOAD makeR
USER REVIEW (TEXTUAL DATA) MINING
Building word vectors from Reviews
The previously constructed word cloud is effective at showing what guests are looking for, but it is quite broad. Wouldn’t it be wonderful if we could find out what people think about the room sizes? Or why not investigate what makes guests “uncomfortable”?
These represent the good words pulled from the reviews.

Analyzing Demand and Supply: Airbnb Customer Growth vs. Listing Prices Over Time

Some code has been copied from work by the authors Ankit Peshin, Sarang Gupta, and Ankita Agrawal.
Comment analysis using word cloud
Let’s start by looking at the most common topics in the reviews; creating a word cloud should be enough for this. Word clouds take a frequency count of the words in the corpus as input and produce a visually appealing representation of the dominant (frequently occurring) words, with each word’s size proportional to its frequency. We have over a million reviews, so we need to take a random sample, in this case 30,000 reviews. Although the sampled dataset is small compared with the original, it serves our purpose well because we only need the most common terms here. As we’ll see in the next section, further study of “good” and “negative” reviews will require more data.
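The frequency count that feeds a word cloud can be sketched in a few lines of Python. This is an illustrative outline only: the stopword list is deliberately tiny, and `word_frequencies` and `sample_comments` are hypothetical names with made-up text.

```python
import random
import re
from collections import Counter

# A tiny illustrative stopword list; a real analysis would use a fuller one.
STOPWORDS = {"the", "a", "and", "was", "is", "to", "in", "of", "it", "very"}

def word_frequencies(comments, sample_size, seed=0):
    """Randomly sample comments and count the non-stopword words in the sample."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sample = rng.sample(comments, min(sample_size, len(comments)))
    counts = Counter()
    for text in sample:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return counts

# Hypothetical review comments.
sample_comments = [
    "The location was great",
    "Great host and a clean room",
    "Clean, quiet, and great location",
]
freqs = word_frequencies(sample_comments, sample_size=3)
print(freqs.most_common(3))
```

The resulting counts are what a word cloud library would scale into word sizes.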
#These are the words most associated with “uncomfortable”
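One simple way to approximate such associations is to count which words appear near a target word. Note that this is a plain co-occurrence count, not trained word vectors, and it is offered only as a sketch in Python; `words_near`, the window size, the stopword set, and the sample comments are all assumptions for illustration.

```python
import re
from collections import Counter

def words_near(comments, target, window=3, stopwords=frozenset()):
    """Count words found within `window` positions of `target` in each comment."""
    counts = Counter()
    for text in comments:
        words = re.findall(r"[a-z']+", text.lower())
        for i, word in enumerate(words):
            if word == target:
                # Words just before and just after the target, excluding the target.
                neighbours = words[max(0, i - window):i] + words[i + 1:i + 1 + window]
                counts.update(n for n in neighbours if n not in stopwords)
    return counts

# Hypothetical review comments.
sample_comments = [
    "The bed was uncomfortable and small",
    "Uncomfortable bed but nice host",
]
near = words_near(sample_comments, "uncomfortable", window=2,
                  stopwords={"the", "was", "and", "but"})
print(near.most_common())
```

Word vectors capture similarity across the whole corpus; a window count like this one only surfaces words that sit next to the target in the same review.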